Special issue on speech separation and recognition in multisource environments

نویسندگان

  • Jon Barker
  • Emmanuel Vincent
چکیده

One of the chief difficulties of building distant-microphone speech recognition systems for use in `everyday' applications is that the noise background is typically `multisource'. A speech recognition system designed to operate in a family home, for example, must contend with competing noise from televisions and radios, children playing, vacuum cleaners, and outdoors noises from open windows. Despite their complexity, such environments contain structure that can be learnt and exploited using advanced source separation, machine learning and speech recognition techniques such as those presented at the 1st International Workshop on Machine Listening in Multisource Environments (CHiME 2011).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Advances in audio source seperation and multisource audio content retrieval

Audio source separation aims to extract the signals of individual sound sources from a given recording. In this paper, we review three recent advances which improve the robustness of source separation in real-world challenging scenarios and enable its use for multisource content retrieval tasks, such as automatic speech recognition (ASR) or acoustic event detection (AED) in noisy environments. ...

متن کامل

The Munich 2011 CHiME Challenge Contribution: NMF-BLSTM Speech Enhancement and Recognition for Reverberated Multisource Environments

We present the Munich contribution to the PASCAL ‘CHiME’ Speech Separation and Recognition Challenge: Our approach combines source separation by supervised convolutive non-negative matrix factorisation (NMF) with our tandem recogniser that augments acoustic features by word predictions of a Long Short-Term Memory recurrent neural network in a multi-stream Hidden Markov Model. The performance of...

متن کامل

Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

This article proposes and evaluates various methods to integrate the concept of bidirectional Long Short-Term Memory (BLSTM) temporal context modeling into a system for automatic speech recognition (ASR) in noisy and reverberated environments. Building on recent advances in Long Short-Term Memory architectures for ASR, we design a novel front-end for contextsensitive Tandem feature extraction a...

متن کامل

Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...

متن کامل

Informing multisource decoding in robust automatic speech recognition

Listeners are remarkably adept at recognising speech in natural multisource environments, while most Automatic Speech Recognition (ASR) technology fails in these conditions. It has been proposed that this human ability is governed by Auditory Scene Analysis (ASA) processes, in which a sound mixture is segregated into perceptual packages, called ‘streams’, by a combination of bottom-up and top-d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Computer Speech & Language

دوره 27  شماره 

صفحات  -

تاریخ انتشار 2013